An Introduction to Applied Multivariate Analysis with R (Use R!) by Brian Everitt
Author: Brian Everitt
Language: English
Format: mobi
Publisher: Springer
Published: 2013-04-10
6 Cluster Analysis
Fig. 6.2. Inter-cluster distance measures.
Fig. 6.3. Darwin’s Tree of Life.
6.3 Agglomerative hierarchical clustering
R> (dm <- dist(measure[, c("chest", "waist", "hips")]))
       1     2     3     4     5     6     7     8     9    10
2   6.16
3   5.66  2.45
4   7.87  2.45  4.69
5   4.24  5.10  3.16  7.48
6  11.00  6.08  5.74  7.14  7.68
7  12.04  5.92  7.00  5.00 10.05  5.10
8   8.94  3.74  4.00  3.74  7.07  5.74  4.12
9   7.81  3.61  2.24  5.39  4.58  3.74  5.83  3.61
10 10.10  4.47  4.69  5.10  7.35  2.24  3.32  3.74  3.00
11  7.00  8.31  6.40  9.85  5.74 11.05 12.08  8.06  7.48 10.25
12  7.35  7.07  5.48  8.25  6.00  9.95 10.25  6.16  6.40  8.83
13  7.81  8.54  7.28  9.43  7.55 12.08 11.92  7.81  8.49 10.82
14  8.31 11.18  9.64 12.45  8.66 14.70 15.30 11.18 11.05 13.75
15  7.48  6.16  4.90  7.07  6.16  9.22  9.00  4.90  5.74  7.87
16  7.07  6.00  4.24  7.35  5.10  8.54  9.11  5.10  5.00  7.48
17  7.81  7.68  6.71  8.31  7.55 11.40 10.77  6.71  7.87  9.95
18  6.71  6.08  4.58  7.28  5.39  9.27  9.49  5.39  5.66  8.06
19  9.17  5.10  4.47  5.48  7.07  6.71  5.74  2.00  4.12  5.10
20  7.68  9.43  7.68 10.82  7.00 12.41 13.19  9.11  8.83 11.53
      11    12    13    14    15    16    17    18    19
2
3
4
5
6
7
8
9
10
11
12  2.24
13  2.83  2.24
14  3.74  5.20  3.74
15  3.61  1.41  3.00  6.40
16  3.00  1.41  3.61  6.40  1.41
17  3.74  2.24  1.41  5.10  2.24  3.32
18  2.83  1.00  2.83  5.83  1.00  1.00  2.45
19  6.71  4.69  6.40  9.85  3.46  3.74  5.39  4.12
20  1.41  3.00  2.45  2.45  4.36  4.12  3.74  3.74  7.68
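The entries printed above are Euclidean distances, the default `method` of `dist()`: for observations i and j measured on chest, waist and hips, d_ij is the square root of the summed squared differences over the three variables. As a small check, the sketch below uses three hypothetical observations (not the measure data) chosen so that the distances come out as whole numbers, and compares a hand computation with `dist()`.

```r
# Three hypothetical observations on three variables (NOT the measure data)
x <- matrix(c(0, 0,  0,
              3, 4,  0,
              3, 4, 12),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("chest", "waist", "hips")))

# Euclidean distance between observations 1 and 3, computed by hand
d13 <- sqrt(sum((x[1, ] - x[3, ])^2))
d13                  # 13

# dist() uses method = "euclidean" by default and stores only the lower
# triangle, column by column: d(2,1), d(3,1), d(3,2)
dm_toy <- dist(x)
as.vector(dm_toy)    # 5 13 12
```

Storing only the lower triangle is why the printed output above has no diagonal and no upper half: the matrix is symmetric with zeros on the diagonal.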
Applying each of the three clustering methods described earlier (single, complete, and average linkage) to the distance matrix, and plotting the corresponding dendrograms, is done with the hclust() function:
R> plot(cs <- hclust(dm, method = "single"))
R> plot(cc <- hclust(dm, method = "complete"))
R> plot(ca <- hclust(dm, method = "average"))
The resulting plots (for single, complete, and average linkage) are given in the
upper part of Figure 6.4.
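The three methods differ only in how the distance between two clusters A and B is computed from the pairwise distances between their members: the minimum for single linkage, the maximum for complete linkage, and the mean for average linkage. The sketch below writes these rules out with a helper `linkage_dist()` (our own illustrative name, not part of R) and checks them against the merge heights that `hclust()` reports, using four hypothetical points on a line rather than the measure data.

```r
# Four hypothetical points on a line (not the measure data)
x <- c(0, 1, 4, 10)
D <- as.matrix(dist(x))

# Hypothetical helper: distance between clusters A and B (index vectors)
# under the three linkage rules
linkage_dist <- function(D, A, B, method = c("single", "complete", "average")) {
  method <- match.arg(method)
  d <- D[A, B, drop = FALSE]     # all pairwise distances between A and B
  switch(method,
         single   = min(d),      # nearest pair
         complete = max(d),      # farthest pair
         average  = mean(d))     # mean over all pairs
}

# For these points every method first joins 1 and 2 (distance 1),
# then adds point 3, and finally joins {1, 2, 3} with point 4
for (m in c("single", "complete", "average")) {
  hc <- hclust(dist(x), method = m)
  by_hand <- c(D[1, 2],
               linkage_dist(D, 1:2, 3, m),
               linkage_dist(D, 1:3, 4, m))
  cat(m, "\n")
  print(rbind(hclust = hc$height, by_hand = by_hand))
}
```

The merge order is the same for all three methods here only because of how these particular points are spaced; the heights differ (1, 3, 6 for single; 1, 4, 10 for complete; 1, 3.5, 25/3 for average).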
[Figure 6.4: top row, dendrograms of hclust(*, "single"), hclust(*, "complete") and hclust(*, "average") applied to dm (panels Single, Complete, Average; y-axis Height); bottom row, plots of PC1 against PC2 with each point labelled by its cluster, 1 or 2.]
Fig. 6.4. Cluster solutions for measure data. The top row gives the cluster dendrograms along with the cutoff used to derive the classes presented (in the space of the first two principal components) in the bottom row.
We now need to consider how to select specific partitions of the data (i.e., a solution with a particular number of groups) from these dendrograms. The answer is that we “cut” the dendrogram at some height; the branches crossing the cut define a partition with a particular number of groups. How do we choose where to cut or, in other words, how do we decide on a particular number of groups that is, in some sense, optimal for the data? This is a more difficult question to answer.
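Cutting at height h keeps only the merges made at heights of at most h, so the number of groups is one more than the number of merges above the cut. `cutree()` accepts either a height (`h`) or a number of groups (`k`). A minimal sketch, assuming four hypothetical points on a line rather than the measure data:

```r
x  <- c(0, 1, 4, 10)                      # hypothetical points, not the measure data
hc <- hclust(dist(x), method = "single")
hc$height                                 # merge heights: 1 3 6

# Any cut with 3 <= h < 6 undoes only the last merge, giving two groups
cutree(hc, h = 4)                         # 1 1 1 2
cutree(hc, k = 2)                         # the same partition

# Range of heights that yields exactly k groups
# (n observations give n - 1 merges; valid for 2 <= k <= n - 1)
k <- 2
n <- length(hc$height) + 1
c(lower = hc$height[n - k], upper = hc$height[n - k + 1])
```

`cutree()` numbers the groups in order of first appearance of the observations, so the labels 1 and 2 carry no meaning beyond distinguishing the groups.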
One informal approach is to examine the sizes of the changes in height in the dendrogram and take a “large” change to indicate the appropriate number of clusters for the data. (More formal approaches are described in Everitt et al. 2011.) Even with this informal approach, it is not easy to decide where to “cut” the dendrograms in Figure 6.4.
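One way to make the “large change in height” rule concrete is to cut inside the widest gap between successive merge heights. The helper `largest_gap_k()` below is our own illustrative function (not part of R and not the book's code), applied to six hypothetical points forming two obvious groups; on real data such as the measure dendrograms the widest gap need not be clear-cut.

```r
# Hypothetical helper: number of groups suggested by the widest gap
# between successive merge heights of an hclust tree
largest_gap_k <- function(hc) {
  h <- hc$height                  # n - 1 merge heights, non-decreasing
  n <- length(h) + 1              # number of observations
  i <- which.max(diff(h))         # widest gap lies between h[i] and h[i + 1]
  n - i                           # cutting there leaves n - i groups
}

# Six hypothetical points forming two well-separated groups
x  <- c(0, 1, 2, 10, 11, 12)
hs <- hclust(dist(x), method = "single")
hs$height                         # 1 1 1 1 8
largest_gap_k(hs)                 # 2
cutree(hs, k = largest_gap_k(hs)) # 1 1 1 2 2 2
```

Ties among the largest gaps are resolved by `which.max()` in favour of the first one, which is an arbitrary choice; inspecting `diff(hs$height)` directly is usually more informative than any single rule.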
So instead, because we know that these data consist of measurements on
ten men and ten women, we will look at the two-group solutions from each
method that are obtained by cutting the dendrograms at suitable heights. We
can display and compare the three solutions graphically by plotting the first
two principal component scores of the data, labelling the points to identify
the cluster solution of one of the methods by using the following code:
R> ## principal components of the (standardized) raw measurements
R> body_pc <- princomp(measure[, c("chest", "waist", "hips")], cor = TRUE)
R> xlim <- range(body_pc$scores[,1])
R> plot(body_pc$scores[,1:2], type = "n",
+      xlim = xlim, ylim = xlim)
R> lab <- cutree(cs, h = 3.8)
R> text(body_pc$scores[,1:2], labels = lab, cex = 0.6)
The resulting plots are shown in the lower part of Figure 6.4. The dendrograms and the principal components scatterplots are combined into a single diagram using the layout() function (see the chapter demo for the complete R code). The plot associated with the single linkage solution immediately demonstrates one of the problems with using this method in practice: chaining, the tendency to incorporate intermediate points between clusters into an existing cluster rather than initiating a new one.
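Chaining is easy to reproduce on a line. In the hypothetical data below (not the measure data), two groups {0, 1, 2} and {8, 9, 10} are optionally joined by a chain of intermediate points spaced 1.5 apart. Single linkage only looks at the nearest pair, so once the chain is present the two groups end up in one cluster at height 1.5 instead of 6; complete linkage, which uses the farthest pair, still places its final merge at the full spread of 10. The helper `final_height()` is our own illustrative function.

```r
# Two hypothetical groups, and a chain of intermediate points between them
groups <- c(0, 1, 2, 8, 9, 10)
chain  <- c(3.5, 5, 6.5)

# Hypothetical helper: height of the last merge, i.e. the distance at
# which all observations have been combined into a single cluster
final_height <- function(x, method) {
  max(hclust(dist(x), method = method)$height)
}

# Rows: without / with the chain; columns: linkage method
sapply(c("single", "complete"), function(m)
  c(without_chain = final_height(groups, m),
    with_chain    = final_height(c(groups, chain), m)))
```

The collapse of the single linkage height from 6 to 1.5 is the mechanism behind chaining: the intermediate points let the method absorb one group into the other step by step rather than keeping them apart.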